Consumer credit risk: Individual probability estimates using machine learning

Authors

  • Jochen Kruppa
  • Alexandra Schwarz
  • Gerhard Arminger
  • Andreas Ziegler
Abstract

Consumer credit scoring is often considered a classification task where clients receive either a good or a bad credit status. Default probabilities provide more detailed information about the creditworthiness of consumers, and they are usually estimated by logistic regression. Here, we present a general framework for estimating individual consumer credit risks using machine learning methods. Since a probability is an expected value, all nonparametric regression approaches that are consistent for the mean are also consistent for the probability estimation problem. Random forests (RF), k-nearest neighbors (kNN), and bagged k-nearest neighbors (bNN), among others, belong to this class of consistent nonparametric regression approaches. We apply the machine learning methods and an optimized logistic regression to a large dataset of complete payment histories of short-term installment credits. We demonstrate probability estimation in Random Jungle, an RF package written in C++ with a generalized framework for fast tree growing, probability estimation, and classification. We also describe an algorithm for tuning the terminal node size for probability estimation. We demonstrate that regression RF outperforms the optimized logistic regression model, kNN, and bNN on the test data of the short-term installment credits. © 2013 Elsevier Ltd. All rights reserved.
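
The point that a probability is an expected value can be illustrated with a minimal sketch: regressing a 0/1 default indicator on the covariates yields an estimate of the default probability. The code below is not the paper's Random Jungle implementation. It is a plain-Python kNN regression on simulated data, and the function names, the single covariate, the linear "true" probability, and the choice of k are all assumptions made for illustration.

```python
import random


def knn_probability(train_x, train_y, query, k):
    """Estimate P(y = 1 | x = query) as the mean 0/1 label of the k nearest points.

    Averaging 0/1 labels is ordinary kNN regression; because the conditional
    mean of a 0/1 variable is a probability, the result is a probability estimate.
    """
    # Rank training points by squared Euclidean distance to the query.
    order = sorted(
        range(len(train_x)),
        key=lambda i: sum((a - b) ** 2 for a, b in zip(train_x[i], query)),
    )
    nearest = order[:k]
    return sum(train_y[i] for i in nearest) / k


def true_default_probability(x):
    # Hypothetical data-generating model, used only to simulate labels.
    return 0.1 + 0.8 * x


# Simulated clients with one covariate in [0, 1]; label 1 means default.
rng = random.Random(0)
train_x = [(rng.random(),) for _ in range(2000)]
train_y = [1 if rng.random() < true_default_probability(x[0]) else 0 for x in train_x]

# Individual probability estimates for a low-risk and a high-risk client.
estimate_low = knn_probability(train_x, train_y, (0.1,), k=200)
estimate_high = knn_probability(train_x, train_y, (0.9,), k=200)
print(f"estimated default probability at x=0.1: {estimate_low:.3f}")
print(f"estimated default probability at x=0.9: {estimate_high:.3f}")
```

In this sketch the neighborhood size k plays a role loosely analogous to the terminal node size in a regression forest: larger values average over more clients, which smooths the estimates at the cost of local detail.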

Similar resources

Consumer credit-risk models via machine-learning algorithms

We apply machine-learning techniques to construct nonlinear nonparametric forecasting models of consumer credit risk. By combining customer transactions and credit bureau data from January 2005 to April 2009 for a sample of a major commercial bank’s customers, we are able to construct out-of-sample forecasts that significantly improve the classification rates of credit-card-holder delinquencies...

An Evaluation of Support Vector Machines in Consumer Credit Analysis

This thesis examines a support vector machine approach for determining consumer credit. The support vector machine using a radial basis function (RBF) kernel is compared to a previous implementation of a decision tree machine learning model. The dataset used for evaluation was provided by a large bank and includes relevant consumer-level data, including transactions and credit-bureau data. The ...

Paper 1323-2017: Real AdaBoost: Boosting for Credit Scorecards and Similarity to WOE Logistic Regression

AdaBoost is a machine learning algorithm that builds a series of small decision trees, adapting each tree to predict difficult cases missed by the previous trees and combining all trees into a single model. We will discuss the AdaBoost methodology and introduce the extension called Real AdaBoost. Real AdaBoost comes from a strong academic pedigree: its authors are pioneers of machine learning a...

Modelling the credit risk for portfolios of consumer loans: Analogies with corporate loan models

The Internal Ratings Based (IRB) approach suggested in the New Basel Accord regulations (BIS 2005) uses a capital allocation formula derived from a Merton style structural model of the credit risk of portfolios of corporate loans. Yet this formula is being applied in the case of consumer loans as well as corporate loans. This has highlighted that although there are a number of well established ...

Data mining with Support Vector Machine

Machine learning is considered a subfield of artificial intelligence and is concerned with the development of techniques and methods that enable computers to learn. This paper introduces SVM, a set of techniques and methodologies developed for machine learning tasks. Support vector machines (SVMs) are a set of related supervised learning methods used for classification and regression. S...

Journal:
  • Expert Syst. Appl.

Volume 40

Publication date: 2013